Abstract

A number of problems in statistical physics and computer science can 
be expressed as the computation of marginal probabilities over a Markov 
random field. Belief propagation, an iterative message-passing algorithm, 
computes exactly such marginals when the underlying graph is a tree. 
But it has gained its popularity as an efficient way to approximate them 
in the more general case, even if it can exhibit multiple fixed points and
is not guaranteed to converge. In this paper, we express a new sufficient
condition for local stability of a belief propagation fixed point in terms of
the graph structure and the belief values at the fixed point. This gives
credence to the usual understanding that Belief Propagation performs 
better on sparse graphs.

1 Introduction

We consider in this work a Markov random field (MRF) on a finite graph with
local interactions, on which we want to compute marginal probabilities. The
structure of the underlying model is described by a set of discrete variables
$\mathbf{x} = \{x_i, i \in V\} \in \{1,\ldots,q\}^{|V|}$, where the set $V$ of
variables is linked together by so-called "factors", which are subsets
$a \subset V$ of variables. If $F$ is this set of factors, we consider the set
of probability measures of the form

\[ p(\mathbf{x}) = \prod_{i \in V} \phi_i(x_i) \prod_{a \in F} \psi_a(\mathbf{x}_a), \tag{1} \]

where $\mathbf{x}_a = \{x_i, i \in a\}$. In what follows, a factor will be
considered interchangeably as a node in a graph or as a set of variables. In
this respect, $i \in a$ can be read as "the variable node $i$ is connected to
the factor node $a$."

$F$ together with $V$ define the factor graph $\mathcal{G}$, which is an
undirected bipartite graph. We will also assume that $p$ is strictly positive,
which is to say that the MRF exhibits no deterministic behavior. The set $E$ of
edges contains all the pairs $(a,i) \in F \times V$ such that $i \in a$. We
denote by $d_a$ (resp. $d_i$) the degree of the factor node $a$ (resp. of the
variable node $i$).

Exact procedures for computing marginal probabilities of p generally face 
an exponential complexity and one has to resort to approximate procedures. In 
computer science, the belief propagation (BP) algorithm ^ is a message passing 
procedure that allows one to compute exact marginal probabilities efficiently
when the underlying graph is a tree. When the graph has cycles, it is still
possible to apply the procedure, which converges with rather good accuracy on
sufficiently sparse graphs. However, there may be several fixed points,
corresponding to
stationary points of the Bethe free energy [Hj. Stable fixed points of BP are 
local minima of the Bethe free energy [H [TB] . 

The question of convergence of BP has been addressed in a series of works [TUl 
[31IH], which establish sufficient conditions on the MRF under which BP converges 
to a unique fixed point. However, cases with multiple fixed points can be used 
to encode different patterns [5] and have not been studied yet. Wainwright [15]
suggests that, facing the joint problem of parameter estimation and prediction
in an MRF, estimation under the Bethe approximation and prediction using BP
is an efficient setting. This consists in choosing (1) such that one fixed
point is known. We propose here to change the viewpoint and, instead of looking
for conditions ensuring a single fixed point, examine the local properties of
each of them. Theorem 4.1 gives a sufficient condition for local stability of
fixed points, which quantifies the known fact that BP performs better on
sparser graphs.

The paper is organized as follows: the BP algorithm and its various
normalization strategies are defined in Section 2. Section 3 exhibits cases
where convergence of messages is equivalent to convergence of beliefs, allowing
us to consider only message convergence. Finally, in Section 4 we provide some
sufficient conditions for local stability of BP fixed points. Section 5
concludes the paper.

2 The belief propagation algorithm 

The belief propagation algorithm [S] is a message passing procedure, whose
output is a set of estimated marginal probabilities, the beliefs
$b_a(\mathbf{x}_a)$ (including single-node beliefs $b_i(x_i)$). The idea is to
factor the marginal probability at a given site as a product of contributions
coming from neighboring factor nodes, which are the messages. With definition
(1) of the joint probability measure, the update rules read:

\[ m_{a \to i}(x_i) \leftarrow \sum_{\mathbf{x}_{a \setminus i}} \psi_a(\mathbf{x}_a) \prod_{j \in a \setminus i} n_{j \to a}(x_j), \tag{2} \]

\[ n_{i \to a}(x_i) = \phi_i(x_i) \prod_{a' \ni i,\, a' \neq a} m_{a' \to i}(x_i), \tag{3} \]

where the notation $\sum_{\mathbf{x}_{a \setminus i}}$ should be understood as
summing from $1$ to $q$ all the variables $x_j$, $j \in a \subset V$,
$j \neq i$. At any point of the algorithm, one can compute the current beliefs
as

\[ b_i(x_i) = \frac{1}{Z_i(m)} \phi_i(x_i) \prod_{a \ni i} m_{a \to i}(x_i), \tag{4} \]

\[ b_a(\mathbf{x}_a) = \frac{1}{Z_a(m)} \psi_a(\mathbf{x}_a) \prod_{i \in a} n_{i \to a}(x_i), \tag{5} \]

where $Z_i(m)$ and $Z_a(m)$ are the normalization constants that ensure that

\[ \sum_{x_i} b_i(x_i) = 1, \qquad \sum_{\mathbf{x}_a} b_a(\mathbf{x}_a) = 1. \tag{6} \]

These constants reduce to 1 when $\mathcal{G}$ is a tree. When the algorithm
has converged, the following compatibility condition holds:

\[ \sum_{\mathbf{x}_{a \setminus i}} b_a(\mathbf{x}_a) = b_i(x_i). \tag{7} \]
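
As an illustration of updates (2)-(3) and of the beliefs (4), here is a minimal
NumPy sketch, not the authors' implementation. The function names `run_bp` and
`exact_marginals`, the dictionary layout of the messages, and the small random
tree-shaped model are all assumptions made for the example. On a tree, the
beliefs returned by BP should coincide with the exact marginals computed by
brute-force enumeration.

```python
import itertools
import numpy as np

def run_bp(phi, factors, n_iter=10):
    """Plain BP: updates (2)-(3), then single-node beliefs (4)."""
    q = len(phi[0])
    # m[(a, i)] is the message from factor a to variable i, initialised to 1
    m = {(a, i): np.ones(q) for a, (scope, _) in enumerate(factors) for i in scope}

    def n_msg(i, a):
        # Update (3): phi_i times the messages into i from all factors except a
        out = phi[i].copy()
        for (a2, i2), msg in m.items():
            if i2 == i and a2 != a:
                out = out * msg
        return out

    for _ in range(n_iter):
        new = {}
        for a, (scope, psi) in enumerate(factors):
            for pos, i in enumerate(scope):
                t = psi.copy()
                for pos2, j in enumerate(scope):
                    if j != i:
                        shape = [1] * len(scope)
                        shape[pos2] = q
                        t = t * n_msg(j, a).reshape(shape)
                # Update (2): sum over every variable of a except i
                other_axes = tuple(p for p in range(len(scope)) if p != pos)
                new[(a, i)] = t.sum(axis=other_axes)
        m = new  # parallel update of all messages

    beliefs = []
    for i in range(len(phi)):
        b = phi[i].copy()
        for (a, i2), msg in m.items():
            if i2 == i:
                b = b * msg
        beliefs.append(b / b.sum())  # dividing by the sum plays the role of Z_i(m)
    return beliefs

def exact_marginals(phi, factors):
    """Brute-force marginals of the measure (1), for comparison only."""
    q, n = len(phi[0]), len(phi)
    p = np.zeros((q,) * n)
    for x in itertools.product(range(q), repeat=n):
        w = np.prod([phi[i][x[i]] for i in range(n)])
        for scope, psi in factors:
            w *= psi[tuple(x[i] for i in scope)]
        p[x] = w
    p /= p.sum()
    return [p.sum(axis=tuple(k for k in range(n) if k != i)) for i in range(n)]

# Hypothetical tree-shaped factor graph: factor 0 = {0, 1}, factor 1 = {1, 2, 3}
rng = np.random.default_rng(0)
q = 3
phi = [rng.uniform(0.5, 1.5, q) for _ in range(4)]
factors = [((0, 1), rng.uniform(0.5, 1.5, (q, q))),
           ((1, 2, 3), rng.uniform(0.5, 1.5, (q, q, q)))]

bp_beliefs = run_bp(phi, factors)
exact = exact_marginals(phi, factors)
```

The messages are updated in parallel here; other schedules are possible and the
choice is not prescribed by the text.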

In practice, the messages are often normalized so that

\[ \sum_{x_i = 1}^{q} m_{a \to i}(x_i) = 1. \tag{8} \]

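However, the possibilities of normalization are not limited to this setting.
Consider the mapping

\[ \Theta_{ai,x_i}(m) = \sum_{\mathbf{x}_{a \setminus i}} \psi_a(\mathbf{x}_a) \prod_{j \in a \setminus i} \Big[ \phi_j(x_j) \prod_{a' \ni j,\, a' \neq a} m_{a' \to j}(x_j) \Big], \tag{9} \]

obtained by inserting (3) into (2). A normalized version of BP is defined by
the update rule

\[ \tilde m_{a \to i}(x_i) \leftarrow \frac{\Theta_{ai,x_i}(\tilde m)}{Z_{ai}(\tilde m)}, \tag{10} \]

where $Z_{ai}(\tilde m)$ is a constant that depends on the messages and which,
in the case of (8), reads

\[ Z^{\mathrm{mess}}_{ai}(\tilde m) = \sum_{x = 1}^{q} \Theta_{ai,x}(\tilde m). \tag{11} \]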

Following [11], it is worth noting that (2)-(3) can be rewritten as

\[ m_{a \to i}(x_i) \leftarrow \frac{Z_a(m)\, b_{i|a}(x_i)}{Z_i(m)\, b_i(x_i)}\, m_{a \to i}(x_i), \tag{12} \]

where we use the convenient shorthand notation
$b_{i|a}(x_i) = \sum_{\mathbf{x}_{a \setminus i}} b_a(\mathbf{x}_a)$. This
suggests a different type of normalization, used in particular by [4], namely

\[ Z^{\mathrm{bel}}_{ai}(m) = \frac{Z_a(m)}{Z_i(m)}, \tag{13} \]

which leads to the simple update rule

\[ \tilde m_{a \to i}(x_i) \leftarrow \frac{b_{i|a}(x_i)}{b_i(x_i)}\, \tilde m_{a \to i}(x_i). \tag{14} \]

3 Belief and message dynamic



At each step of the algorithm, using (4) and (5), we can compute the current
beliefs $b_i^{(n)}$ and $b_a^{(n)}$ associated with the message $m^{(n)}$. The
sequence $m^{(n)}$ will be said to be "b-convergent" when the sequences
$b_i^{(n)}$ and $b_a^{(n)}$ converge. This is the convergence that is
interesting in practice. The term "m-convergence" will be used to refer to
convergence of the sequence $m^{(n)}$ itself. Since the algorithm is expressed
in terms of messages, m-convergence obviously implies b-convergence, but the
converse is not generally true. The aim of this section is to provide a broad
class of normalization policies such that b- and m-convergence are equivalent,
in order to focus on m-convergence in the next section.

As pointed out in [5], different sets of messages correspond to the same set 
of beliefs. The following lemma makes this explicit. 

Lemma 3.1. Two sets of messages $m$ and $m'$ lead to the same beliefs if and
only if there is a set of strictly positive constants $c_{ai}$ such that

\[ m_{a \to i}(x_i) = c_{ai}\, m'_{a \to i}(x_i). \]

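Proof. The direct part of the lemma is trivial. Concerning the other part, we
have from (4) and (5)

\[ \prod_{j \in a} n_{j \to a}(x_j) = \frac{b_a(\mathbf{x}_a) Z_a(m)}{\psi_a(\mathbf{x}_a)}, \qquad \prod_{a \ni i} m_{a \to i}(x_i) = \frac{b_i(x_i) Z_i(m)}{\phi_i(x_i)}. \tag{15} \]

Assume the two vectors of messages $m$ and $m'$ lead to the same set of beliefs
$b$ and write $m_{a \to i}(x_i) = c_{ai}(x_i)\, m'_{a \to i}(x_i)$. Then, from
the relation on $b_i(x_i)$, the vector $c$ satisfies

\[ \prod_{a \ni i} c_{ai}(x_i) = \frac{Z_i(m)}{Z_i(m')} =: v_i. \]

Moreover, we want to preserve the beliefs $b_a$. Using (15), we have

\[ \prod_{i \in a} c_{ai}(x_i) = \prod_{i \in a} \frac{m_{a \to i}(x_i)}{m'_{a \to i}(x_i)} = \frac{Z_a(m')}{Z_a(m)} \prod_{i \in a} v_i =: v_a. \tag{16} \]

Since $v_i$ (resp. $v_a$) does not depend on the choice of $x_i$ (resp.
$\mathbf{x}_a$), (16) implies the independence of $c_{ai}(x_i)$ with respect to
$x_i$. Indeed, if we compare two vectors $\mathbf{x}_a$ and $\mathbf{x}'_a$
such that, for all $i \in a \setminus j$, $x'_i = x_i$, but $x'_j \neq x_j$,
then $c_{aj}(x_j) = c_{aj}(x'_j)$, which concludes the proof. ∎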

Following an idea developed in [S], it is natural to look at the behavior of
BP in a quotient space corresponding to the invariance of beliefs. First, we
will introduce a natural parametrization for which the quotient space is just a
vector space. Then we will show that, in terms of b-convergence, the effect of
normalization is null. Let us consider the following change of variables:

\[ \mu_{a \to i}(x_i) = \log m_{a \to i}(x_i), \]

so that the plain update mapping (9) becomes

\[ \Lambda_{ai,x_i}(\mu) = \log \Big[ \sum_{\mathbf{x}_{a \setminus i}} \psi_a(\mathbf{x}_a) \prod_{j \in a \setminus i} \phi_j(x_j) \exp\Big( \sum_{j \in a \setminus i} \sum_{b \ni j,\, b \neq a} \mu_{b \to j}(x_j) \Big) \Big]. \]

We have $\mu \in \mathcal{N} = \mathbb{R}^{|E| q}$ and we define the vector
space $\mathcal{W}$ which is the linear span of the following vectors
$\{e_{ai} \in \mathcal{N}\}_{(ai) \in E}$:

\[ (e_{ai})_{cj,x_j} = 1_{\{a = c,\ i = j\}}. \]

The invariance set of the beliefs corresponding to $\mu$ is simply the affine
space $\mu + \mathcal{W}$ (Lemma 3.1). So $\mu^{(n)}$ is b-convergent iff
$[\mu^{(n)}]$ converges in the quotient space $\mathcal{N} \backslash \mathcal{W}$,
which is simply a vector space [3]. We use the notation $[x]$ for the
canonical projection of $x$ on $\mathcal{N} \backslash \mathcal{W}$.

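The normalization of $\mu$ leads to $\mu + w$ with some $w \in \mathcal{W}$.
Indeed we have

\[ \Lambda_{ai,x_i}(\mu + w) = \Lambda_{ai,x_i}(\mu) + \sum_{j \in a \setminus i} \sum_{b \ni j,\, b \neq a} w_{bj} =: \Lambda_{ai,x_i}(\mu) + l_{ai}, \]

which can be summed up by $[\Lambda(\mu + \mathcal{W})] = [\Lambda(\mu)]$,
since $l \in \mathcal{W}$. This means that normalization plays no role in
$\mathcal{N} \backslash \mathcal{W}$ and implies the following proposition.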

Proposition 3.2. The dynamic, i.e. the value of the normalized beliefs at each 
step, of the BP algorithm with or without normalization is exactly the same. 
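
Lemma 3.1 and Proposition 3.2 can be checked numerically on a small example.
The sketch below is only an illustration under stated assumptions: a
hypothetical pairwise model on a 3-cycle with random positive potentials, the
sum normalization (8), and helper names (`n_msg`, `step`, `beliefs`) chosen for
this example. It compares the single-node beliefs of plain and normalized BP at
every iteration, then rescales each message by an arbitrary positive constant.

```python
import numpy as np

rng = np.random.default_rng(1)
q = 2
pairs = [(0, 1), (1, 2), (0, 2)]  # pairwise factors a = {i, j} forming one cycle
phi = rng.uniform(0.5, 1.5, size=(3, q))
psi = [rng.uniform(0.5, 1.5, size=(q, q)) for _ in pairs]  # psi[a][x_i, x_j]

def n_msg(m, j, a):
    # Update (3): phi_j times the messages into j from all factors except a
    out = phi[j].copy()
    for (a2, k), msg in m.items():
        if k == j and a2 != a:
            out = out * msg
    return out

def step(m, normalize):
    new = {}
    for a, (i, j) in enumerate(pairs):
        new[(a, i)] = psi[a] @ n_msg(m, j, a)    # update (2), sum over x_j
        new[(a, j)] = psi[a].T @ n_msg(m, i, a)  # update (2), sum over x_i
    if normalize:
        new = {key: v / v.sum() for key, v in new.items()}  # normalization (8)
    return new

def beliefs(m):
    # Single-node beliefs (4), normalized so that each row sums to 1
    b = phi.copy()
    for (a, i), msg in m.items():
        b[i] = b[i] * msg
    return b / b.sum(axis=1, keepdims=True)

m_plain = {(a, i): np.ones(q) for a, pair in enumerate(pairs) for i in pair}
m_norm = dict(m_plain)
same_at_each_step = True
for _ in range(20):
    m_plain, m_norm = step(m_plain, False), step(m_norm, True)
    same_at_each_step = same_at_each_step and bool(
        np.allclose(beliefs(m_plain), beliefs(m_norm)))

# Lemma 3.1: multiplying each message by its own constant c_ai keeps the beliefs
c = {key: rng.uniform(0.1, 10.0) for key in m_norm}
rescaled = {key: c[key] * v for key, v in m_norm.items()}
rescaling_keeps_beliefs = bool(np.allclose(beliefs(m_norm), beliefs(rescaled)))
```

Only the single-node beliefs are compared here; the factor beliefs (5) could be
checked in the same way.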

We will come back to this vision in terms of quotient space in Section 4.3
and we now exhibit a broad class of normalizations for which b-convergence and
m-convergence are equivalent.

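Definition 3.3. A normalization $Z_{ai}$ is said to be positive homogeneous
when it is of the form $Z_{ai} = N_{ai} \circ \Theta_{ai}$, with
$N_{ai} : \mathbb{R}_+^q \to \mathbb{R}_+$ a positive homogeneous function of
order 1 satisfying

\[ N_{ai}(\lambda m_{a \to i}) = \lambda N_{ai}(m_{a \to i}), \quad \forall \lambda > 0, \tag{17} \]
\[ N_{ai}(m_{a \to i}) = 0 \iff m_{a \to i} = 0. \tag{18} \]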

A particular family of positive homogeneous normalizations is obtained when
$N_{ai}$ is a norm on $\mathbb{R}^q$. This is the case of the normalization
$Z^{\mathrm{mess}}_{ai}$ (11). It is actually not necessary to have a proper
norm: the scheme used in [13] amounts to
$Z^{1}_{ai}(m) = \Theta_{ai,1}(m)$.

Note however that $Z^{\mathrm{bel}}_{ai}$ (13) is not positive homogeneous, and
therefore the results of this section do not apply to this case.
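
To make property (17) concrete, the following sketch checks it numerically for
three candidate functions $N_{ai}$ applied to a positive vector. The vector
`theta` merely stands in for $\Theta_{ai}(m)$, and the dictionary
`normalizers` with its labels is an assumption introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.uniform(0.1, 2.0, size=4)  # stands in for the vector Theta_ai(m), q = 4

# Candidate functions N_ai (hypothetical labels)
normalizers = {
    "sum, as in (11)": lambda v: v.sum(),
    "max norm": lambda v: np.abs(v).max(),
    "first component": lambda v: v[0],
}

lam = 3.7
# Property (17): N(lam * v) == lam * N(v) for lam > 0
homogeneous = {name: bool(np.isclose(N(lam * theta), lam * N(theta)))
               for name, N in normalizers.items()}

# A message normalized by N has N-value 1, the fact used in the proof below
normalized = {name: theta / N(theta) for name, N in normalizers.items()}
unit_value = {name: bool(np.isclose(N(normalized[name]), 1.0))
              for name, N in normalizers.items()}
```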

Proposition 3.4. For any positive homogeneous normalization Zai with con- 
tinuous Nai, m-convergence and b-convergence are equivalent. 

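Proof. Assume that the sequences of beliefs are such that
$b_a^{(n)} \to b_a$ and $b_i^{(n)} \to b_i$ as $n \to \infty$. The idea of the
proof is to first express the normalized messages $\tilde m^{(n)}_{a \to i}$
at each step in terms of these beliefs, and then to conclude by a continuity
argument. Starting from a rewrite of (4)-(5),

\[ b_i^{(n)}(x_i) = \frac{n^{(n)}_{i \to a}(x_i)\, \tilde m^{(n)}_{a \to i}(x_i)}{Z_i(\tilde m^{(n)})}, \qquad b_a^{(n)}(\mathbf{x}_a) = \frac{\psi_a(\mathbf{x}_a)}{Z_a(\tilde m^{(n)})} \prod_{j \in a} n^{(n)}_{j \to a}(x_j), \]

one obtains by recombination

\[ \tilde m^{(n)}_{a \to i}(x_i) \prod_{j \in a \setminus i} \tilde m^{(n)}_{a \to j}(x_j) = \frac{\prod_{j \in a} Z_j(\tilde m^{(n)})}{Z_a(\tilde m^{(n)})}\, K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}; x_i), \]

where an arbitrary variable $i \in a$ has been singled out and

\[ K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}; x_i) = \frac{\psi_a(\mathbf{x}_a) \prod_{j \in a} b_j^{(n)}(x_j)}{b_a^{(n)}(\mathbf{x}_a)}. \]

Assume now that $\mathbf{x}_{a \setminus i}$ is fixed and consider
$K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}) = K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}; \cdot)$
as a vector of $\mathbb{R}^q$. Normalizing each side of the equation with a
positive homogeneous function $N_{ai}$ yields

\[ \frac{\tilde m^{(n)}_{a \to i}(x_i)}{N_{ai}\big[\tilde m^{(n)}_{a \to i}\big]} = \frac{K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}; x_i)}{N_{ai}\big[K^{(n)}_{ai}(\mathbf{x}_{a \setminus i})\big]}. \]

Actually $N_{ai}\big[\tilde m^{(n)}_{a \to i}\big] = 1$, since
$\tilde m^{(n)}_{a \to i}$ has been normalized by $N_{ai}$, and therefore

\[ \tilde m^{(n)}_{a \to i}(x_i) = \frac{K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}; x_i)}{N_{ai}\big[K^{(n)}_{ai}(\mathbf{x}_{a \setminus i})\big]}. \]

This concludes the proof, since $\tilde m^{(n)}_{a \to i}$ has been expressed
as a continuous function of $b_i^{(n)}$ and $b_a^{(n)}$, and therefore it
converges whenever the beliefs converge. ∎
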

4 Local stability of BP fixed points 

The question of convergence of BP has been addressed in a series of works 
[TUl [S] H] which establish conditions and bounds on the MRF coefficients for 
having global convergence. In this section, we change the viewpoint and, instead 
of looking for conditions ensuring a single fixed point, we examine the local 
properties of each fixed point.

In what follows, we are interested in the local stability of a message fixed
point $m$ with associated beliefs $b$. It is known that a BP fixed point is
locally attractive if the Jacobian of the relevant mapping ((9) or its
normalized version) at this point has all its eigenvalues of modulus strictly
smaller than 1, and unstable when at least one eigenvalue has a modulus
strictly greater than 1. The characterization of the local stability relies on
two ingredients. The first one is the oriented line graph $L(\mathcal{G})$
based on $\mathcal{G}$, whose vertices are the elements of $E$, and whose
oriented links relate $ai$ to $a'j$ if $j \in a \cap a'$, $j \neq i$ and
$a' \neq a$. The corresponding 0-1 adjacency matrix $A$ is defined by the
coefficients

\[ A^{a'j}_{ai} = 1_{\{j \in a \cap a',\ j \neq i,\ a' \neq a\}}. \tag{19} \]
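
The matrix $A$ of (19) is straightforward to build explicitly. The sketch
below does so for a hypothetical example (pairwise factors on the complete
graph with 4 variables, so that $d_a = 2$ and $d_i = 3$) and computes its
largest eigenvalue in modulus. The function name `line_graph_adjacency` and
the representation of factors as Python sets are assumptions of this example.

```python
import itertools
import numpy as np

def line_graph_adjacency(factors):
    """0-1 matrix of (19); rows and columns are indexed by the edges (a, i) of E."""
    edges = [(a, i) for a, scope in enumerate(factors) for i in scope]
    index = {e: n for n, e in enumerate(edges)}
    A = np.zeros((len(edges), len(edges)))
    for (a, i) in edges:
        for (a2, j) in edges:  # (a2, j) in E already means j belongs to a2
            if j in factors[a] and j != i and a2 != a:
                A[index[(a, i)], index[(a2, j)]] = 1.0
    return edges, A

# Hypothetical example: one pairwise factor per pair of the 4 variables
factors = [set(pair) for pair in itertools.combinations(range(4), 2)]
edges, A = line_graph_adjacency(factors)

# Largest eigenvalue in modulus of the nonnegative matrix A
lambda_1 = max(abs(np.linalg.eigvals(A)))
```

With uniform degrees every row of $A$ sums to $(d_a - 1)(d_i - 1)$, which is
then also the largest eigenvalue; the denser the graph, the larger this value.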

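The second ingredient is the set of stochastic matrices $B^{(iaj)}$, attached
to pairs of variables $(i, j)$ having a factor node $a$ in common, and whose
coefficients at row $k$, column $\ell$ (in $\{1, \ldots, q\}^2$) are the
conditional beliefs

\[ B^{(iaj)}_{k\ell} = b_a(x_j = \ell \mid x_i = k) = \sum_{\mathbf{x}_{a \setminus \{i,j\}}} \frac{b_a(\mathbf{x}_a)}{b_i(x_i)} \bigg|_{x_i = k,\ x_j = \ell}. \]
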

4.1 The unnormalized algorithm 

Let us first consider briefly the unnormalized algorithm (2)-(3). Using the
representation (9), the Jacobian reads at this point:

\[ \frac{\partial \Theta_{ai,x_i}(m)}{\partial m_{a' \to j}(x_j)} = \frac{\sum_{\mathbf{x}_{a \setminus \{i,j\}}} b_a(\mathbf{x}_a)}{b_i(x_i)}\, \frac{m_{a \to i}(x_i)}{m_{a' \to j}(x_j)}\, 1_{\{j \in a \setminus i\}}\, 1_{\{a' \ni j,\ a' \neq a\}}. \]

Therefore, the Jacobian of the plain BP algorithm is, using a trivial change
of variable, similar to the matrix $J$ defined, for any pair $(ai, k)$ and
$(a'j, \ell)$ of $E \times \{1, \ldots, q\}$, by the elements

\[ J^{a'j,\ell}_{ai,k} = B^{(iaj)}_{k\ell}\, A^{a'j}_{ai}. \]

This expression is analogous to the Jacobian encountered in [8]. It is
interesting to note that it only depends on the structure of the graph and on
the belief corresponding to the fixed point. Since $\mathcal{G}$ is a singly
connected graph, it is clear that $A$ is an irreducible matrix. To simplify the
discussion, we assume in the following that $J$ is also irreducible. This will
be true as long as the $\psi$ are always positive.

It can be shown [7] that the spectral radius of J is always larger than 1, 
except in some special cases where the number of cycles in the graph is less 
than 1. We will not develop this point here.

4.2 Positive homogeneous normalization



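We have seen in Proposition 3.4 that all the continuous positively homogeneous
normalizations make m-convergence equivalent to b-convergence. Since they all
share the same properties, we look at the particular case of
$Z^{\mathrm{mess}}_{ai}(m)$, which is both simple and differentiable. The
coefficients of the Jacobian matrix at fixed point $m$ with beliefs $b$ read

\[ \frac{\partial}{\partial m_{a' \to j}(\ell)} \left[ \frac{\Theta_{ai,k}(m)}{Z^{\mathrm{mess}}_{ai}(m)} \right] = \Big[ B^{(iaj)}_{k\ell} - \sum_{y=1}^{q} m_{a \to i}(y)\, B^{(iaj)}_{y\ell} \Big] \frac{m_{a \to i}(k)}{m_{a' \to j}(\ell)}\, A^{a'j}_{ai}, \]

which is similar to the matrix $\tilde J$ of general term

\[ \tilde J^{a'j,\ell}_{ai,k} = \Big[ B^{(iaj)}_{k\ell} - \sum_{y=1}^{q} m_{a \to i}(y)\, B^{(iaj)}_{y\ell} \Big] A^{a'j}_{ai}, \tag{20} \]

which can be summarized by $\tilde J = (I - M) J$, with $I$ the identity
matrix and $M$:

\[ M^{a'j,\ell}_{ai,k} = m_{a' \to j}(\ell)\, 1_{\{a = a',\ i = j\}}. \]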

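The presence of the messages in the Jacobian $\tilde J$ seems to complicate the
study, but in fact the spectrum of $\tilde J$ does not depend on the messages
themselves. It is known (see e.g. [2]) that it is possible to choose the
functions $\phi$ and $\psi$ as

\[ \phi_i(x_i) = \hat b_i(x_i), \qquad \psi_a(\mathbf{x}_a) = \frac{\hat b_a(\mathbf{x}_a)}{\prod_{i \in a} \hat b_i(x_i)}, \tag{21} \]

in order to obtain a prescribed set of beliefs $\hat b$ at a fixed point.
Indeed, BP will admit a fixed point with $b_a = \hat b_a$ and $b_i = \hat b_i$
when $m_{a \to i}(x_i) \equiv 1$. Since only the beliefs matter here, without
loss of generality, we restrict ourselves in the remainder of this section to
the functions (21). Then, from (20), with the normalized messages
$m_{a \to i}(x_i) \equiv 1/q$, the definition of $\tilde J$ rewrites

\[ \tilde J^{a'j,\ell}_{ai,k} = \Big[ B^{(iaj)}_{k\ell} - \frac{1}{q} \sum_{y=1}^{q} B^{(iaj)}_{y\ell} \Big] A^{a'j}_{ai}. \]
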

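For each connected pair of variable nodes, we associate to the stochastic
kernel $B^{(iaj)}$ a combined stochastic kernel
$K^{(iaj)} = B^{(iaj)} B^{(jai)}$. In the following we consider $b_i$ as a
vector of $\mathbb{R}^q$. Since $b_i B^{(iaj)} = b_j$, $b_i$ is the invariant
measure associated to $K^{(iaj)}$:

\[ b_i K^{(iaj)} = b_i B^{(iaj)} B^{(jai)} = b_j B^{(jai)} = b_i, \]

and $K^{(iaj)}$ is reversible, since

\[ b_i(k) K^{(iaj)}_{k\ell} = \sum_{y=1}^{q} \frac{b_i(k) B^{(iaj)}_{ky}\; b_i(\ell) B^{(iaj)}_{\ell y}}{b_j(y)} = b_i(\ell) K^{(iaj)}_{\ell k}. \]

Let $\mu_2^{(iaj)}$ be the second largest eigenvalue of $K^{(iaj)}$ and let

\[ \mu_2 = \max_{i,j,a} \sqrt{\big| \mu_2^{(iaj)} \big|}. \]
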

The combined effect of the graph and of the local correlations on the stability 
of the reference fixed point is stated as follows. 

Theorem 4.1. Let $\lambda_1$ be the Perron eigenvalue of the matrix $A$.

(i) If $\lambda_1 \mu_2 < 1$, the fixed point of the BP scheme (10), (11)
associated to $b$ is stable.

(ii) If the system is homogeneous ($B^{(iaj)} = B$ independent of $i$, $j$ and
$a$), $\lambda_1 \mu_2 < 1$ is also a necessary condition.

Condition (i) combines the effects of a term ($\mu_2$) which depends on the
local dependence structure of the given fixed point with another one
($\lambda_1$) characteristic of the underlying graph. For example, in the
homogeneous case, if $\mathcal{G}$ has uniform degrees $d_a$ and $d_i$, the
condition reads

\[ \mu_2 (d_a - 1)(d_i - 1) < 1. \]

In the case of binary variables, $\mu_2^{(iaj)} = \det(K^{(iaj)})$, which is
just the square of Pearson's correlation coefficient between $x_i$ and $x_j$,
which in general depends on the factor $a$. Condition (i) of Theorem 4.1 thus
is an upper bound on the correlations between variables at stable fixed points.
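
The binary case can be verified directly. In the sketch below, the pairwise
belief table `b_a` is a made-up example (any strictly positive table summing
to 1 would do), and the kernels follow the definitions of $B^{(iaj)}$ and
$K^{(iaj)}$ given above. The second eigenvalue of $K^{(iaj)}$ is compared
with the squared Pearson correlation of $x_i$ and $x_j$ under `b_a`.

```python
import numpy as np

# Hypothetical pairwise belief b_a(x_i, x_j) for binary variables
b_a = np.array([[0.4, 0.1],
                [0.2, 0.3]])
b_i, b_j = b_a.sum(axis=1), b_a.sum(axis=0)

B_ij = b_a / b_i[:, None]    # B^(iaj)[k, l] = b_a(x_j = l | x_i = k)
B_ji = b_a.T / b_j[:, None]  # B^(jai)[l, k] = b_a(x_i = k | x_j = l)
K = B_ij @ B_ji              # combined kernel K^(iaj)

eigenvalues = np.sort(np.linalg.eigvals(K).real)
mu2_pair = eigenvalues[-2]   # second largest eigenvalue of K^(iaj)

# Squared Pearson correlation of x_i and x_j, coding the two states as 0 and 1
vals = np.array([0.0, 1.0])
mean_i, mean_j = b_i @ vals, b_j @ vals
cov = vals @ b_a @ vals - mean_i * mean_j
var_i = b_i @ vals**2 - mean_i**2
var_j = b_j @ vals**2 - mean_j**2
rho_squared = cov**2 / (var_i * var_j)

# Contribution of this pair to mu_2 in Theorem 4.1
mu_2 = np.sqrt(abs(mu2_pair))
```

For instance, on a graph with pairwise factors ($d_a = 2$) and $d_i = 3$, the
homogeneous condition $\mu_2 (d_a - 1)(d_i - 1) < 1$ amounts to requiring a
correlation of absolute value below $1/2$.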

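In order to prove part (i) of the theorem, we will consider a local norm on
$\mathbb{R}^q$ attached to each variable node $i$,

\[ \|x\|_{b_i} = \Big( \sum_{k=1}^{q} x_k^2\, b_i(k) \Big)^{1/2} \quad \text{and} \quad \langle x \rangle_{b_i} = \sum_{k=1}^{q} x_k\, b_i(k), \]

the local average of $x \in \mathbb{R}^q$ w.r.t. $b_i$. For convenience, we
will also consider the somewhat hybrid global norm on $\mathbb{R}^{|E| q}$

\[ \|x\|_{\pi,b} = \sum_{(ai) \in E} \pi_{ai}\, \|x_{ai}\|_{b_i}, \]

where $\pi$ is the right Perron vector of $A$, associated to $\lambda_1$. We
have the following useful inequality:

Lemma 4.2. For any $(x^{(i)}, x^{(j)}) \in \mathbb{R}^q \times \mathbb{R}^q$
such that $\langle x^{(i)} \rangle_{b_i} = 0$ and
$x^{(j)}_\ell\, b_j(\ell) = \sum_{k} x^{(i)}_k\, b_i(k)\, B^{(iaj)}_{k\ell}$,

\[ \langle x^{(j)} \rangle_{b_j} = 0 \quad \text{and} \quad \|x^{(j)}\|^2_{b_j} \leq \mu_2^{(iaj)}\, \|x^{(i)}\|^2_{b_i}. \]
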



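Proof. By definition of the kernels $K^{(iaj)}$, we have

\[ \|x^{(j)}\|^2_{b_j} = \sum_{\ell=1}^{q} \frac{1}{b_j(\ell)} \Big| \sum_{k=1}^{q} x^{(i)}_k\, b_i(k)\, B^{(iaj)}_{k\ell} \Big|^2 = \sum_{k,y} x^{(i)}_k\, x^{(i)}_y\, b_i(k)\, K^{(iaj)}_{ky}. \]

Since $K^{(iaj)}$ is reversible, Rayleigh's theorem implies

\[ \mu_2^{(iaj)} = \sup_{x} \left\{ \frac{\sum_{k,y} x_k x_y\, b_i(k)\, K^{(iaj)}_{ky}}{\sum_{k} x_k^2\, b_i(k)},\ \langle x \rangle_{b_i} = 0 \right\}, \]

which concludes the proof. ∎
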

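To deal with iterations of $J$, we express it as a sum over paths:

\[ (J^n)^{a'j,\ell}_{ai,k} = (A^n)^{a'j}_{ai}\, \big( B^{(n)}_{ai,a'j} \big)_{k\ell}, \]

where

\[ B^{(n)}_{ai,a'j} = \frac{1}{|\Gamma^{(n)}_{ai,a'j}|} \sum_{\gamma \in \Gamma^{(n)}_{ai,a'j}}\ \prod_{(bh \to b'h') \in \gamma} B^{(hbh')} \tag{22} \]

is an average stochastic kernel, $\Gamma^{(n)}_{ai,a'j}$ represents the set of
directed paths of length $n$ joining $ai$ and $a'j$ on $L(\mathcal{G})$, and
its cardinality is precisely $|\Gamma^{(n)}_{ai,a'j}| = (A^n)^{a'j}_{ai}$.
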
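Lemma 4.3. For any $(x^{(ai)}, x^{(a'j)}) \in \mathbb{R}^{2q}$ such that
$\langle x^{(ai)} \rangle_{b_i} = 0$ and

\[ x^{(a'j)}_\ell\, b_j(\ell) = \sum_{k} x^{(ai)}_k\, b_i(k)\, \big( B^{(n)}_{ai,a'j} \big)_{k\ell}, \]

the following inequality holds:

\[ \|x^{(a'j)}\|_{b_j} \leq \mu_2^{\,n}\, \|x^{(ai)}\|_{b_i}. \]

Proof. Let $x^{(a'j)}(\gamma)$ be the contribution to $x^{(a'j)}$
corresponding to the path $\gamma \in \Gamma^{(n)}_{ai,a'j}$. Using Lemma 4.2
recursively yields for each individual path

\[ \|x^{(a'j)}(\gamma)\|_{b_j} \leq \mu_2^{\,n}\, \|x^{(ai)}\|_{b_i}, \]

and, owing to the triangle inequality,

\[ \|x^{(a'j)}\|_{b_j} \leq \frac{1}{|\Gamma^{(n)}_{ai,a'j}|} \sum_{\gamma \in \Gamma^{(n)}_{ai,a'j}} \|x^{(a'j)}(\gamma)\|_{b_j} \leq \mu_2^{\,n}\, \|x^{(ai)}\|_{b_i}. \qquad ∎ \]
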

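Proof of Theorem 4.1. Let $v$ and $v'$ be two vectors with
$v' = v \tilde J^{\,n} = v (I - M) J^n$, since $\tilde J M = 0$. Recall that
the effect of $(I - M)$ is to first project on a vector with zero local sum,
$\sum_k (v (I - M))_{ai,k} = 0$, $\forall (ai) \in E$, so we assume directly
$v$ of the form

\[ v_{ai,k} = x_{ai,k}\, b_i(k), \quad \text{with } \langle x_{ai} \rangle_{b_i} = 0. \]

As a result, $v' = v J^n$ is of the same form. Let
$x'_{a'j,\ell} = v'_{a'j,\ell} / b_j(\ell)$. We have

\[ \|x'\|_{\pi,b} \leq \sum_{(a'j) \in E} \sum_{(ai) \in E} \pi_{a'j}\, (A^n)^{a'j}_{ai}\, \|y^{(n)}_{ai,a'j}\|_{b_j}, \]

with
$y^{(n)}_{ai,a'j,\ell}\, b_j(\ell) = \sum_k x_{ai,k}\, b_i(k)\, \big( B^{(n)}_{ai,a'j} \big)_{k\ell}$.
Applying Lemma 4.3 to $y^{(n)}_{ai,a'j}$ yields

\[ \|x'\|_{\pi,b} \leq \sum_{(a'j) \in E} \sum_{(ai) \in E} \pi_{a'j}\, (A^n)^{a'j}_{ai}\, \mu_2^{\,n}\, \|x_{ai}\|_{b_i} = \lambda_1^n \mu_2^{\,n}\, \|x\|_{\pi,b}, \]

since $\pi$ is the right Perron vector of $A$. This ends the proof of (i).

For (ii), when the system is homogeneous, $J$ is a tensor product of $A$ with
$B$, and its spectrum is therefore the product of their respective spectra. ∎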
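
The tensor-product argument used for (ii) can be illustrated numerically. The
sketch below is a toy check under explicit assumptions: the graph is the
hypothetical complete graph on 4 variables with pairwise factors, the common
kernel `B` is a symmetric binary kernel with flip probability `eps`, and the
normalized Jacobian is taken to be the Kronecker product of $A$ with the
column-centered kernel $B - \frac{1}{q} \mathbf{1}\mathbf{1}^T B$, which is how
the projection $I - M$ acts when the messages are uniform.

```python
import itertools
import numpy as np

# Oriented line graph of the complete graph on 4 variables, following (19)
factors = [set(p) for p in itertools.combinations(range(4), 2)]
edges = [(a, i) for a, scope in enumerate(factors) for i in scope]
A = np.array([[float(j in factors[a] and j != i and a2 != a)
               for (a2, j) in edges] for (a, i) in edges])

eps = 0.2
B = np.array([[1 - eps, eps],
              [eps, 1 - eps]])               # same kernel for every (i, a, j)
q = B.shape[0]
B_centered = B - np.ones((q, q)) @ B / q     # removes the eigenvalue 1 of B

J_tilde = np.kron(A, B_centered)             # homogeneous case: tensor product
spectral_radius = max(abs(np.linalg.eigvals(J_tilde)))

lambda_1 = max(abs(np.linalg.eigvals(A)))    # Perron eigenvalue of A
mu_2 = abs(1 - 2 * eps)                      # second eigenvalue of B in modulus
stable = lambda_1 * mu_2 < 1
```

The spectral radius of `J_tilde` equals `lambda_1 * mu_2`, so in this toy
setting the fixed point is stable exactly when $\lambda_1 \mu_2 < 1$.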

The quantity $\mu_2$ is representative of the level of mutual information
between variables. It relates to the spectral gap (see e.g. [1] for geometric
bounds) of each elementary stochastic matrix $B^{(iaj)}$, while $\lambda_1$
encodes the statistical properties of the graph connectivity. The bound
$\lambda_1 \mu_2 < 1$ could be refined when dealing with the statistical
average of the sum over paths in (22), which allows one to define $\mu_2$ as

^ 1 

ai,a' j 
4.3 Local convergence in quotient space $\mathcal{N} \backslash \mathcal{W}$

We make here the connection with the notion of local stability in the quotient
space $\mathcal{N} \backslash \mathcal{W}$ of Section 3. Trivial computations
yield $\nabla \Lambda = J$. In terms of convergence in
$\mathcal{N} \backslash \mathcal{W}$, the stability of a fixed point is given
by the projection of $J$ on the quotient space
$\mathcal{N} \backslash \mathcal{W}$ and we have [5]:

\[ [J] = [\nabla \Lambda] = \nabla [\Lambda]. \]

The normalization $Z^{\mathrm{mess}}_{ai}$ is in fact just a way to compute
$[J]$ by applying a projection $I - M$ to $J$. Since
$\ker(I - M) = \mathcal{W}$, it is just a quotient map from $\mathcal{N}$ to
$\mathcal{N} \backslash \mathcal{W}$. For any differentiable positively
homogeneous normalization, we obtain the same result: the Jacobian of the
corresponding normalized scheme is the projection of $J$ on
$\mathcal{N} \backslash \mathcal{W}$, through some quotient map.

5 Conclusion 

We provided here, for the first time to our knowledge, an explicit sufficient
condition for local stability of a belief propagation fixed point, instead of
sufficient conditions for convergence to a unique fixed point. This condition
is consistent with the usual understanding of BP convergence: when the
connectivity of both $\mathcal{G}$ and $L(\mathcal{G})$ increases, $\lambda_1$
also increases since $A$ is increasing. So Theorem 4.1 imposes that the level
of mutual information $\mu_2^{(iaj)}$ between variables $i$ and $j$ at a stable
fixed point decreases. Conversely, the sparser $\mathcal{G}$ is, the larger the
mutual information can be. This somewhat explains why BP performs better on
sparse graphs: the amount of admissible mutual information between variables at
a stable fixed point is larger on a sparse graph than on a dense one.